Learning Ensembles from Bites: A Scalable and Accurate Approach

Authors

  • Nitesh V. Chawla
  • Lawrence O. Hall
  • Kevin W. Bowyer
  • W. Philip Kegelmeyer
Abstract

Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. These techniques have limitations on massive datasets, as the size of the dataset can be a bottleneck. Voting many classifiers built on small subsets of data (“pasting small votes”) is a promising approach for learning from massive datasets, one that can utilize the power of boosting and bagging. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable.
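
The idea of voting many classifiers, each built on a small subset ("bite") of the data, can be illustrated with a minimal sketch. This is not the authors' framework: it assumes bites drawn uniformly at random, a hypothetical one-feature threshold classifier (`train_stump`) as the base learner, binary labels 0/1, and a plain majority vote. All function names and parameters here are illustrative, and the distributed aspect is omitted (each bite could be trained on a separate worker, since the bites are independent).

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-feature threshold classifier (hypothetical base learner).

    rows is a list of (features, label) pairs with labels 0 or 1.
    Returns (feature_index, threshold, label_if_below, label_if_above).
    """
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for x, _ in rows:
            t = x[f]  # candidate threshold taken from the bite itself
            for below, above in ((0, 1), (1, 0)):
                errors = sum(
                    (below if xi[f] <= t else above) != yi for xi, yi in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, (f, t, below, above))
    return best[1]


def predict_stump(stump, x):
    """Classify one feature vector with a fitted stump."""
    f, t, below, above = stump
    return below if x[f] <= t else above


def train_on_bites(rows, n_classifiers, bite_size, seed=0):
    """Train one stump per small random bite (sampled without replacement)."""
    rng = random.Random(seed)
    return [
        train_stump(rng.sample(rows, bite_size)) for _ in range(n_classifiers)
    ]


def vote(ensemble, x):
    """Return the label predicted by the most ensemble members."""
    counts = Counter(predict_stump(s, x) for s in ensemble)
    return counts.most_common(1)[0][0]


# Toy data (an assumption for illustration): the label depends only on feature 0.
data = [((i / 200, (i * 7 % 200) / 200), int(i > 100)) for i in range(200)]
ensemble = train_on_bites(data, n_classifiers=25, bite_size=20)
accuracy = sum(vote(ensemble, x) == y for x, y in data) / len(data)
print(f"training-set accuracy of the voted ensemble: {accuracy:.2f}")
```

Each classifier sees only 20 of the 200 rows, so the training cost per classifier is independent of the full dataset size; the vote combines them into a single prediction.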

Similar resources

A New WordNet Enriched Content-Collaborative Recommender System

The recommender systems are models that are to predict the potential interests of users among a number of items. These systems are widespread and they have many applications in real-world. These systems are generally based on one of two structural types: collaborative filtering and content filtering. There are some systems which are based on both of them. These systems are named hybrid recommen...

Novel Methods for the Feature Subset Ensembles Approach

Ensemble learning technique attracted much attention in the past few years. Instead of using a single prediction model, this approach utilizes a number of diverse accurate prediction models to do the job. Many methods have been proposed to build such accurate diverse ensembles, of which bagging and boosting were the most popular. Another method, called Feature Subset Ensembles FSE, is thoroughl...

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state of the art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google’s...

Novel Methods for the Feature Subset Ensemble Approach

Ensemble learning technique attracted much attention in the past few years. Instead of using a single prediction model, this approach utilizes a number of diverse accurate prediction models to do the job. Many methods have been proposed to build such accurate diverse ensembles, of which bagging and boosting were the most popular. Another method, called Feature Subset Ensembles (FSE), is thoroug...

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and if performed manually is very time consuming. This issue suggests usage of unsupervised models. One of the accurate methods in this regards is Spectral Ranking of Anomalies (SRA) that is shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

Journal:
  • Journal of Machine Learning Research

Volume 5

Publication date: 2004